tm: Cleanup include lookup #1991
Conversation
The whole |
Don't use the file's inode as the hash. Although it looks like a good idea for de-duplicating links as well, it has several issues, including the non-uniqueness of inodes across file systems. The way it was done, hashing the inode but comparing the file name string pointers, also made the hash mostly irrelevant: it just stored filenames sharing the same inode in the same hash bucket without actually doing any de-duplication, making the whole thing a convoluted way of building a list. Instead, hash and compare the filenames themselves; even though this doesn't de-duplicate links, it is better than the non-functional previous code. Also, directly build the list and only use the hash table to check for duplicates, which is both faster and gives a stable output.
(force-pushed from 08a3892 to fc6a9bb)
Oops, that's what you get for making changes and committing in a hurry.
There are 6-9 lines of code duplication around the |
@bmwiedemann what do you mean, between the glob and non-glob versions? I don't really mind, given that the glob conditional is not trivial.
(force-pushed from 998760c to 8b68c5a)
I just added two extra things here:
|
LGTM. Is the runner.sh change really part of this, or is it a general change that should possibly be separate?
Well, it's not really specific to this, but it's needed for this test case because it requires parsing more than one input file at once, which the current automated setup doesn't allow. So yes, it could be used by more test cases in theory, but in practice we haven't had a use case until now.
That's fine then.
Process files in the order they appear on the command line when generating the tags file, instead of a more or less random order. Closes #1989.
Don't use the file's inode as the hash. Although it looks like a good idea for de-duplicating links as well, it has several issues, including the non-uniqueness of inodes across file systems.
The way it was done, hashing the inode but comparing the file name string pointers, also made the hash mostly irrelevant: it just stored filenames sharing the same inode in the same hash bucket without actually doing any de-duplication, making the whole thing a convoluted way of building a list.
Instead, hash and compare the filenames themselves; even though this doesn't de-duplicate links, it is better than the non-functional previous code.
Also, directly build the list and only use the hash table to check for duplicates, which is both faster and gives a stable output.
See #1989